
NYC Taxi Demand Forecasting

Time series analysis and forecasting of New York City taxi demand using ETS models in R

Timeline: 1 week
Status: Completed
Figure: ETS Model Forecast - 30-Day NYC Taxi Demand Prediction
Tools and topics: R, tidyverse, forecast, tseries, lubridate, time series, ETS models, statistical analysis

Project Overview

A time series forecasting project that predicts daily New York City taxi demand from historical data published by the NYC Taxi and Limousine Commission. The workflow, implemented in R, covers preprocessing, stationarity testing, autocorrelation analysis, and ETS modeling.

Business Impact: Accurate taxi demand forecasting helps optimize fleet management, reduce wait times, and improve overall transportation efficiency in urban environments.

Introduction

Problem Statement: New York City's taxi industry faces challenges in matching supply with fluctuating demand. Accurate forecasting enables better resource allocation and improved customer service.

Dataset: The dataset contains 10,320 observations of taxi demand recorded at 30-minute intervals (215 days of data). Each observation has a timestamp and a demand value, ranging from 8 to 39,197 rides per interval.

Methodology: Implemented in R using tidyverse for data manipulation, the forecast package for ETS modeling, and tseries for stationarity testing.

Data Preprocessing

Data Transformation

  • Converted timestamp strings to datetime format
  • Aggregated 30-minute interval data to daily level for trend analysis
  • Handled time series conversion with proper frequency settings
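
The three steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual script: the column names `timestamp` and `value`, the start date, and the weekly frequency of 7 are all assumptions.

```r
library(dplyr)
library(lubridate)

# Synthetic stand-in for the raw data: 10,320 half-hourly rows.
# Column names, start date, and values are assumptions for illustration.
set.seed(1)
raw <- tibble(
  timestamp = format(
    seq(ymd_hms("2014-07-01 00:00:00"), by = "30 min", length.out = 10320),
    "%Y-%m-%d %H:%M:%S"
  ),
  value = rpois(10320, lambda = 15000)
)

daily <- raw %>%
  mutate(timestamp = ymd_hms(timestamp)) %>%   # string -> POSIXct
  group_by(date = as_date(timestamp)) %>%      # one group per calendar day
  summarise(demand = sum(value), .groups = "drop")

# Daily series; frequency = 7 (weekly cycle) is an assumed setting
demand_ts <- ts(daily$demand, frequency = 7)
print(head(daily))
```

Summing 48 half-hour intervals per day yields 215 daily totals from the 10,320 rows.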

Key Libraries Used

# Loading essential R libraries
library(tidyverse)    # Data manipulation and visualization
library(lubridate)    # Date-time operations
library(forecast)     # Time series forecasting
library(tseries)      # Stationarity testing
library(zoo)          # Time series objects

Exploratory Data Analysis

The time series visualization revealed important patterns in NYC taxi demand:

  • Long-term trends with demand fluctuations over time
  • Seasonal patterns with repeating high and low demand periods
  • Short-term spikes potentially due to external factors like weather or events

Visualization: Line plots of the daily aggregated data, built with ggplot2, were used to identify trends, seasonality, and anomalies.
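
A line plot of this kind might look like the sketch below. The data frame is synthetic and the column names `date` and `demand` are assumptions; only the ggplot2 calls reflect the approach described.

```r
library(ggplot2)

# Synthetic daily series with a weekly cycle (illustrative values only)
set.seed(1)
daily <- data.frame(
  date   = seq(as.Date("2014-07-01"), by = "day", length.out = 215),
  demand = 700000 + 50000 * sin(2 * pi * (1:215) / 7) + rnorm(215, sd = 30000)
)

p <- ggplot(daily, aes(x = date, y = demand)) +
  geom_line() +
  labs(
    title = "Daily taxi demand (synthetic data)",
    x = "Date",
    y = "Rides per day"
  )
print(p)
```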

Stationarity Testing

Augmented Dickey-Fuller (ADF) Test

  • Null Hypothesis: Time series is non-stationary
  • Result: p-value = 0.0334
  • Conclusion: Reject null hypothesis at the 5% significance level - series is stationary

KPSS Test

  • Null Hypothesis: Time series is stationary
  • Result: p-value > 0.1
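  • Conclusion: Fail to reject null hypothesis - no evidence against stationarity

Key Finding: The two tests agree. ADF rejects non-stationarity at the 5% level and KPSS fails to reject stationarity, so no differencing or transformation was applied before modeling. ETS models do not themselves require a stationary series, so these tests mainly serve to characterize the data.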
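
Both tests are available in the tseries package. The sketch below runs them on a synthetic stationary series, so the p-values it prints are not the project's results. Note that tseries interpolates p-values from a table and reports only values inside the tabulated range, which is why a KPSS result can appear as "p-value > 0.1".

```r
library(tseries)

# Synthetic stationary AR(1) series standing in for daily demand
set.seed(42)
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 215)) + 700

adf  <- adf.test(x)                   # H0: unit root (non-stationary)
kpss <- kpss.test(x, null = "Level")  # H0: level-stationary

cat("ADF p-value: ", adf$p.value, "\n")
cat("KPSS p-value:", kpss$p.value, "\n")
```

Because the two tests have opposite null hypotheses, a small ADF p-value together with a large KPSS p-value is the combination that supports stationarity.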

Autocorrelation Analysis

ACF Plot Analysis

  • Showed how current values correlate with past values at different lags
  • Revealed significant autocorrelation at multiple lag periods
  • Helped identify the memory structure of the time series

PACF Plot Analysis

  • Measured the direct correlation at each lag after removing the effect of shorter lags
  • Provided insight into the autoregressive structure of the series
  • Served as a diagnostic alongside the ETS model selection process
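
As a rough sketch of how these quantities can be computed, base R's `acf()` and `pacf()` return the estimated correlations directly. The series here is synthetic, and the maximum lag of 28 is an arbitrary choice for illustration.

```r
# Synthetic AR(1) series (illustrative only)
set.seed(7)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 215))

acf_vals  <- acf(x, lag.max = 28, plot = FALSE)   # includes lag 0
pacf_vals <- pacf(x, lag.max = 28, plot = FALSE)  # starts at lag 1

# Approximate 95% bound for a white-noise series
bound <- 1.96 / sqrt(length(x))

# Lags whose autocorrelation exceeds the bound (lag 0 dropped)
sig_lags <- which(abs(acf_vals$acf[-1]) > bound)
print(sig_lags)
```

Setting `plot = TRUE` (the default) draws the familiar correlogram with the same significance bounds.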

ETS Model Implementation

Model Selection: ETS(A,N,N)

  • Error: Additive (A)
  • Trend: None (N)
  • Seasonality: None (N)
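
A model of this form can be fitted with `ets()` from the forecast package. The sketch below uses a synthetic series and forces the form with `model = "ANN"`; whether the project specified the form or let `ets()` choose it automatically is not stated, so treat that as an assumption.

```r
library(forecast)

# Synthetic random-walk-like daily series (stand-in for aggregated demand)
set.seed(123)
y <- ts(700000 + cumsum(rnorm(215, sd = 10000)))

# "ANN" = additive error, no trend, no seasonality
fit <- ets(y, model = "ANN")
print(fit)
```

Calling `ets(y)` without a `model` argument would instead compare the candidate forms and pick one by information criterion (AICc by default).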

Model Parameters

# ETS Model Summary
ETS(A,N,N)

Smoothing parameters:
  alpha = 0.9999

Initial states:
  l = 685786.821

sigma: 79791.59

Results & Performance

Model Evaluation Metrics

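  • RMSE: 79,419.6 - Root mean squared error, in the same units as the daily series (rides per day)
  • MAPE: 8.93% - Mean absolute percentage error of the forecasts
  • AIC: 6012.162 - Akaike information criterion; only meaningful when compared with other models fitted to the same data (lower is better)
  • ACF1: 0.0887 - Lag-1 autocorrelation of the residuals; a value near zero suggests little structure is left unmodeled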
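
To make the definitions concrete, the sketch below computes RMSE, MAPE, and ACF1 by hand from the residuals of a fitted model and prints the `accuracy()` table from the forecast package for comparison. The series is synthetic, so the numbers will not match those reported above.

```r
library(forecast)

# Synthetic series and model (illustrative only)
set.seed(123)
y <- ts(700000 + cumsum(rnorm(215, sd = 10000)))
fit <- ets(y, model = "ANN")

res <- residuals(fit)                    # one-step-ahead errors on the fitted data

rmse <- sqrt(mean(res^2))                # root mean squared error
mape <- mean(abs(res / y)) * 100         # mean absolute percentage error
acf1 <- acf(res, plot = FALSE)$acf[2]    # lag-1 residual autocorrelation

print(c(RMSE = rmse, MAPE = mape, ACF1 = acf1))
print(accuracy(fit))
cat("AIC:", fit$aic, "\n")
```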

30-Day Forecast

Generated a 30-day forecast from the fitted ETS model to support:

  • Fleet management and resource allocation
  • Driver scheduling optimization
  • Demand anticipation for special events
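  • Infrastructure planning

Interpretation: With alpha = 0.9999, the level is updated almost entirely from the most recent observation, so the model behaves much like a naive (random walk) forecast and reacts immediately to recent demand changes. Because ETS(A,N,N) has no trend or seasonal component, the 30-day point forecast is a flat line at the last estimated level, with prediction intervals that widen as the horizon grows.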
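
The forecasting step can be sketched as below, again on synthetic data. For an ETS(A,N,N) model the point forecasts are constant across the horizon, which the last lines check directly; the interval levels of 80% and 95% are the package defaults, written out here for clarity.

```r
library(forecast)

# Synthetic series and model (illustrative only)
set.seed(123)
y <- ts(700000 + cumsum(rnorm(215, sd = 10000)))
fit <- ets(y, model = "ANN")

# 30-step-ahead forecast with 80% and 95% prediction intervals
fc <- forecast(fit, h = 30, level = c(80, 95))

# Point forecasts: a single repeated value for a no-trend, no-season model
print(range(fc$mean))

# Width of the 95% interval at horizons 1, 15, and 30
width95 <- fc$upper[, 2] - fc$lower[, 2]
print(width95[c(1, 15, 30)])
```

`plot(fc)` or `autoplot(fc)` draws the history together with the forecast and its interval bands.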